W-kmeans: Clustering News Articles Using WordNet
نویسندگان
چکیده
Document clustering is a powerful technique that has been widely used for organizing data into smaller and manageable information kernels. Several approaches have been proposed suffering however from problems like synonymy, ambiguity and lack of a descriptive content marking of the generated clusters. We are proposing the enhancement of standard kmeans algorithm using the external knowledge from WordNet hypernyms in a twofold manner: enriching the “bag of words” used prior to the clustering process and assisting the label generation procedure following it. Our experimentation revealed a significant improvement over standard kmeans for a corpus of news articles derived from major news portals. Moreover, the cluster labeling process generates useful and of high quality cluster tags.
منابع مشابه
Assisting cluster coherency via n-grams and clustering as a tool to deal with the new user problem
Collaborative filtering systems typically need to acquire some data about the new user in order to start making personalized suggestions, a situation commonly referred to as the ‘‘new user problem’’. In this work we attempt to address the new user problem via a unique personalized strategy for prompting the user with articles to rate. Our approach makes use of hypernyms extracted from the WordN...
متن کاملA clustering technique for news articles using WordNet
Please cite this article in press as: C. Bouras, V dx.doi.org/10.1016/j.knosys.2012.06.015 The Web is overcrowded with news articles, an overwhelming information source both with its amount and diversity. Document clustering is a powerful technique that has been widely used for organizing data into smaller and manageable information kernels. Several approaches have been proposed which, however,...
متن کاملEnhancing News Articles Clustering using Word N-Grams
In this work we explore the possible enhancement of the document clustering results, and in particular clustering of news articles from the web, when using word-based n-grams during the keyword extraction phase. We present and evaluate a weighting approach that combines clustering of news articles derived from the web using n-grams, extracted from the articles at an offline stage. We compared t...
متن کاملImproving news articles recommendations via user clustering
Although commonly only item clustering is suggested by Web mining techniques for news articles recommendation systems, one of the various tasks of personalized recommendation is categorization of Web users. With the rapid explosion of online news articles, predicting user-browsing behavior using collaborative filtering (CF) techniques has gained much attention in the web personalization area. H...
متن کاملUser Personalization via W-kmeans
With the rapid explosion of online news articles, predicting userbrowsing behavior using collaborative filtering techniques has gained much attention in the web personalization area. However, common collaborative filtering techniques suffer from low accuracy and performance. This research proposes a new personalized recommendation approach that integrates user and text clustering based on our d...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2010